What Cannot be Learned with Bethe Approximations

Authors

Uri Heinemann

Amir Globerson

Abstract

We address the problem of learning the parameters in graphical models when inference is intractable. A common strategy in this case is to replace the partition function with its Bethe approximation. We show that there exists a regime of empirical marginals where such Bethe learning will fail. By failure we mean that the empirical marginals cannot be recovered from the approximated maximum likelihood parameters (i.e., moment matching is not achieved). We provide several conditions on empirical marginals that yield outer and inner bounds on the set of Bethe learnable marginals. An interesting implication of our results is that there exists a large class of marginals that cannot be obtained as stable fixed points of belief propagation. Taken together, our results provide a novel approach to analyzing learning with Bethe approximations and highlight when it can be expected to work or fail.

Probabilistic graphical models [8, 23] are a powerful tool for describing complex multivariate distributions. They have been used successfully in a wide range of fields, from computational biology to machine vision and natural language processing. To use such a model in practice, one typically needs to solve two related tasks. The first is the inference task, which involves calculating probabilities of events under the model. The second involves learning the parameters of the model from empirical data.

Unfortunately, in many models of interest the inference problem is computationally hard and cannot be solved exactly in practice. This has motivated extensive research into approximate inference schemes, some of which have been quite successful empirically. Perhaps the best known of these is the belief propagation (BP) algorithm, which is closely related to variational approximations based on Bethe free energies [26]. Another variational approach, which uses convex free energies, is the tree-reweighted (TRW) method [22].
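The following is a minimal sketch of the belief propagation algorithm mentioned above. It runs damped sum-product loopy BP on a binary pairwise MRF of the kind defined in the next section. Several details are illustrative assumptions and not the authors' implementation: the function name `loopy_bp`, the damping scheme, and the singleton log-potentials `theta_i`. On a tree, BP returns exact marginals. On graphs with cycles, its fixed points correspond to stationary points of the Bethe free energy, and convergence is not guaranteed.

```python
import numpy as np


def loopy_bp(n, edges, theta_i, theta_ij, iters=500, damping=0.5, tol=1e-10):
    """Damped sum-product loopy BP on a binary pairwise MRF (illustrative sketch).

    n        -- number of variables, each taking values in {0, 1}
    edges    -- list of (i, j) pairs
    theta_i  -- (n, 2) array of singleton log-potentials (an assumed extra term)
    theta_ij -- dict mapping (i, j) to a (2, 2) array indexed [x_i, x_j]
    Returns an (n, 2) array of singleton beliefs (approximate marginals).
    """
    psi_i = np.exp(theta_i)
    psi = {}  # psi[(i, j)][x_i, x_j], stored for both edge orientations
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        psi[(i, j)] = np.exp(theta_ij[(i, j)])
        psi[(j, i)] = psi[(i, j)].T
        nbrs[i].append(j)
        nbrs[j].append(i)

    # msgs[(i, j)] is the message from node i to node j: a distribution over x_j.
    msgs = {d: np.full(2, 0.5) for d in psi}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            # Local potential at i times all incoming messages except the one from j.
            prod = psi_i[i].copy()
            for k in nbrs[i]:
                if k != j:
                    prod *= msgs[(k, i)]
            m = prod @ psi[(i, j)]  # sum over x_i of prod[x_i] * psi[x_i, x_j]
            new[(i, j)] = damping * msgs[(i, j)] + (1 - damping) * m / m.sum()
        delta = max(np.abs(new[d] - msgs[d]).max() for d in msgs)
        msgs = new
        if delta < tol:
            break

    beliefs = psi_i.copy()
    for (k, i), m in msgs.items():
        beliefs[i] *= m  # multiply in every message arriving at node i
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a tree such as a single edge, the returned beliefs equal the exact marginals. On graphs with cycles they are only approximations. Comparing them with brute-force enumeration on small graphs is a simple sanity check.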
Although the TRW approach results in convex optimization problems for inference, it sometimes yields marginals that are inferior to those obtained by BP (e.g., see [9]).

How should one learn the parameters of a model when inference is intractable? The typical approach to parameter learning is likelihood maximization, but when inference is intractable it is also hard to maximize the likelihood. Because of this difficulty, many methods have been devised to approximate the learning problem.¹ One elegant approach is to approximate the likelihood using the same variational approximation that is employed during inference [5, 14, 16, 19].

Analyzing the performance of approximate learning schemes is challenging, since even the accuracy of the underlying variational approximations is hard to analyze. Furthermore, we do not generally expect the learned model to be similar to the one obtained using exact maximum likelihood. One approach, recently introduced by Wainwright [19], is to use the notion of moment matching. In exact maximum likelihood learning, the learned model has a nice property: some of its marginals are guaranteed to be identical to those of the empirical data. This property is often referred to as moment matching. Wainwright [19, 21] has shown that when using convex variational approximations such as TRW, the learned model also has the moment matching property in the following sense: if one applies approximate inference to it (using the same variational approach that was used during learning), the resulting marginals will be equal to the empirical ones. However, these results cannot be applied to learning with Bethe approximations, since the latter are not convex.

¹ When the data are known to be generated by a graphical model of the same structure, pseudo-likelihood [1] can be used and is consistent. However, this assumption is rarely met in practice, and pseudo-likelihood often does not perform well in these cases.
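The moment matching property of exact maximum likelihood can be checked numerically on a model small enough to enumerate. The sketch below rests on several assumptions. It uses binary variables and an overcomplete indicator parameterization with one singleton table per node and one pairwise table per edge. It fits the model by plain gradient ascent. The helper names `fit_exact_ml`, `model_distribution`, and `marginals` are hypothetical. The gradient of the average log-likelihood with respect to each table entry is the empirical marginal minus the model marginal, so at the optimum the two coincide.

```python
import itertools

import numpy as np


def model_distribution(configs, theta_i, theta_ij):
    """Exact p(x; theta) over all rows of `configs` (brute force)."""
    score = sum(theta_i[i, configs[:, i]] for i in range(configs.shape[1]))
    for (i, j), t in theta_ij.items():
        score = score + t[configs[:, i], configs[:, j]]
    w = np.exp(score - score.max())  # subtract the max for numerical stability
    return w / w.sum()


def marginals(configs, weights, edges):
    """Singleton and pairwise marginals of a weighted set of configurations."""
    mu_i = np.array([[weights[configs[:, i] == v].sum() for v in (0, 1)]
                     for i in range(configs.shape[1])])
    mu_ij = {(i, j): np.array([[weights[(configs[:, i] == a) & (configs[:, j] == b)].sum()
                                for b in (0, 1)] for a in (0, 1)])
             for i, j in edges}
    return mu_i, mu_ij


def fit_exact_ml(data, edges, lr=0.5, iters=2000):
    """Gradient ascent on the exact average log-likelihood of `data`."""
    n = data.shape[1]
    configs = np.array(list(itertools.product([0, 1], repeat=n)))
    emp_i, emp_ij = marginals(data, np.full(len(data), 1.0 / len(data)), edges)
    theta_i = np.zeros((n, 2))
    theta_ij = {e: np.zeros((2, 2)) for e in edges}
    for _ in range(iters):
        p = model_distribution(configs, theta_i, theta_ij)
        mod_i, mod_ij = marginals(configs, p, edges)
        # Gradient = empirical moments minus model moments.
        theta_i += lr * (emp_i - mod_i)
        for e in edges:
            theta_ij[e] += lr * (emp_ij[e] - mod_ij[e])
    return theta_i, theta_ij
```

As an example, fit ten binary samples over three variables on a triangle with edges `[(0, 1), (1, 2), (0, 2)]`. The fitted model's node and edge marginals then agree with the empirical ones up to the optimization tolerance. This is the moment matching property described above. The question raised here is whether an analogous property survives when the exact partition function is replaced by its Bethe approximation.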
Because of the success of Bethe approximations in a wide array of applications, it is important to understand the advantages and limitations of learning with them. This is precisely the goal of our work. It may initially seem that learning with Bethe approximations would also result in a moment matching property. In other words, if we use Bethe approximations during both learning and inference, our learned model would agree with the empirical marginals. However, as we show here, the situation is considerably more complex.

In the current work we provide some surprising results with respect to moment matching and Bethe approximations. These results shed light on the performance of learning with such approximations and on properties of the BP algorithm. Our main results are:

• We show that there exist empirical distributions for which Bethe approximations cannot perform moment matching. In other words, if we run BP on the optimal Bethe parameters, we will not recover the empirical marginals. Such empirical distributions are thus bad inputs for Bethe approximations, since the learned parameters cannot be used to reconstruct the original marginals.
• We provide inner and outer bounds on the set of marginals for which Bethe moment matching is possible, and show that they agree with the empirical behavior of Bethe learning. Surprisingly, we show that binary attractive models cannot be learned with Bethe approximations for certain graphs.
• Our results also provide a novel characterization of BP fixed points. Specifically, we show that there is a large class of marginals that cannot be obtained as stable fixed points of BP.

Taken together, our results provide a novel way of analyzing learning with Bethe approximations.

1 Maximum Likelihood in Graphical Models

We focus on pairwise Markov random fields for simplicity. That is, we consider random variables $X_1, \ldots, X_{N_V}$ and pairwise functions $\theta_{ij}(x_i, x_j)$ corresponding to edges $E$ in a graph $G$ with $N_V$ nodes.
The MRF corresponding to these parameters is given by:

$$p(x;\theta) = \frac{1}{Z(\theta)} \exp\Big( \sum_{ij \in E} \theta_{ij}(x_i, x_j) + \sum_{i=1}^{N_V} \cdots \Big)$$
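The definition above can be made concrete by brute-force enumeration. Enumeration also shows why exact inference becomes intractable: computing $Z(\theta)$ requires summing over all $2^{N_V}$ joint assignments of binary variables. The sketch below assumes binary variables. It includes optional singleton tables `theta_i` as an assumption; they default to zero. The helper name `pairwise_mrf` is hypothetical.

```python
import itertools

import numpy as np


def pairwise_mrf(n_nodes, edges, theta_ij, theta_i=None):
    """Brute-force p(x; theta) for a binary pairwise MRF.

    theta_ij -- dict mapping each edge (i, j) to a (2, 2) table indexed [x_i, x_j]
    theta_i  -- optional (n_nodes, 2) singleton tables (assumed; default zeros)
    Returns (configs, probs, Z). The cost grows as 2**n_nodes.
    """
    if theta_i is None:
        theta_i = np.zeros((n_nodes, 2))
    configs = list(itertools.product([0, 1], repeat=n_nodes))
    scores = np.array([
        sum(theta_ij[(i, j)][x[i], x[j]] for i, j in edges)
        + sum(theta_i[k, x[k]] for k in range(n_nodes))
        for x in configs
    ])
    weights = np.exp(scores)  # unnormalized exp(sum of potentials)
    Z = weights.sum()         # partition function Z(theta)
    return configs, weights / Z, Z
```

For an attractive pair, `pairwise_mrf(2, [(0, 1)], {(0, 1): np.array([[1.0, 0.0], [0.0, 1.0]])})` places weight $e$ on each agreeing assignment and weight 1 on each disagreeing one. The partition function is therefore $Z(\theta) = 2e + 2$.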

Similar resources

Bethe and Related Pairwise Entropy Approximations

For undirected graphical models, belief propagation often performs remarkably well for approximate marginal inference, and may be viewed as a heuristic to minimize the Bethe free energy. Focusing on binary pairwise models, we demonstrate that several recent results on the Bethe approximation may be generalized to a broad family of related pairwise free energy approximations with arbitrary count...

A Modified Energy Balance Method to Obtain Higher-order Approximations to the Oscillators with Cubic and Harmonic Restoring Force

This article analyzes a strongly nonlinear oscillator with cubic and harmonic restoring force and proposes an efficient analytical technique based on the modified energy balance method (MEBM). The proposed method incorporates higher-order approximations. After applying the proposed MEBM, a set of complicated higher-order nonlinear algebraic equations are obtained. Higher-order nonlinear algebra...

Static correlation beyond the random phase approximation: dissociating H2 with the Bethe-Salpeter equation and time-dependent GW.

We investigate various approximations to the correlation energy of a H2 molecule in the dissociation limit, where the ground state is poorly described by a single Slater determinant. The correlation energies are derived from the density response function and it is shown that response functions derived from Hedin's equations (Random Phase Approximation (RPA), Time-dependent Hartree-Fock (TDHF), ...

Inference in Boltzmann Machines, Mean Field, TAP and Bethe Approximations

Bethe Free Energy and Contrastive Divergence Approximations

Bethe Free Energy and Contrastive Divergence Approximations for Undirected Graphical Models. Yee Whye Teh, Doctorate of Philosophy, Graduate Department of Computer Science, University of Toronto, 2003. As the machine learning community tackles more complex and harder problems, the graphical models needed to solve such problems become larger and more complicated. As a result performing inference and l...

Publication date: 2011